Sequence analysis MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
نویسندگان
چکیده
Summary: MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a timeand cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MEGAHIT assembles the data as a whole, i.e. no pre-processing like partitioning and normalization was needed. When compared with previous methods on assembling the soil data, MEGAHIT generated a three-time larger assembly, with longer contig N50 and average contig length; furthermore, 55.8% of the reads were aligned to the assembly, giving a fourfold improvement. Availability and implementation: The source code of MEGAHIT is freely available at https://github. com/voutcn/megahit under GPLv3 license. Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
منابع مشابه
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252 Gbps in 44.1 and 99.6 h on a single computing node with and without a graphics processing unit, respectively. MEGAHIT assembles the data as a whole, i.e. no pre-processing like partitioning and normalization was n...
متن کاملIntegrating long-range connectivity information into de Bruijn graphs.
Motivation The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously t...
متن کاملSuccinct data structures for assembling large genomes
MOTIVATION Second-generation sequencing technology makes it feasible for many researches to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and f...
متن کاملFaucet: streaming de novo assembly graph construction
Motivation We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ...
متن کاملUtilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis
MOTIVATION Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory ch...
متن کامل